
    Dream to Adapt: Meta Reinforcement Learning by Latent Context Imagination and MDP Imagination

    Meta reinforcement learning (Meta RL) has been widely explored as a way to quickly learn an unseen task by transferring previously learned knowledge from similar tasks. However, most state-of-the-art algorithms require the meta-training tasks to densely cover the task distribution and need a large amount of data for each task. In this paper, we propose MetaDreamer, a context-based Meta RL algorithm that requires fewer real training tasks and less data by performing meta-imagination and MDP-imagination. We perform meta-imagination by interpolating on the learned latent context space with disentangled properties, and MDP-imagination through a generative world model in which physical knowledge is added to plain VAE networks. Our experiments with various benchmarks show that MetaDreamer outperforms existing approaches in data efficiency and interpolated generalization.
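    As a rough illustration of the meta-imagination idea in the abstract above, the sketch below linearly interpolates between two latent context vectors to produce contexts for imagined intermediate tasks. It is not MetaDreamer's implementation: the function name `interpolate_contexts`, the two-dimensional example vectors, and the choice of plain linear interpolation are all assumptions made for illustration.

```python
def interpolate_contexts(z_a, z_b, num_points=5):
    """Return num_points latent vectors evenly spaced on the line from z_a to z_b.

    Hypothetical helper: z_a and z_b stand in for latent contexts that an
    encoder inferred from two real training tasks.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    # Interpolation weights 0, 1/(n-1), ..., 1
    weights = [i / (num_points - 1) for i in range(num_points)]
    return [[(1 - t) * a + t * b for a, b in zip(z_a, z_b)] for t in weights]


# Made-up latent contexts for two training tasks (illustrative values only)
z_task_a = [0.2, -1.0]
z_task_b = [1.0, 0.5]
imagined = interpolate_contexts(z_task_a, z_task_b, num_points=5)
for z in imagined:
    print(z)
```

    In a context-based method, each interpolated vector would condition the policy and world model as if it came from a real task; whether the interpolated points correspond to meaningful tasks depends on how well the latent space is disentangled.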

    Synthesis and Evaluation of Automated Vehicles

    This dissertation focuses on the synthesis of a decision-making system for Automated Vehicles (AVs), and then evaluates the safety and robustness of the system with an eye toward improving the system design. We begin with a synthesis of an AV’s decision-making system in a specific driving environment. We model the environment as a Markov Decision Process (MDP), with the goal of determining the optimal strategy (that is, policy) for this particular MDP. We propose a novel Reinforcement Learning (RL) method using model-based exploration. This method allows the training agent to explore the MDP state space by maximizing the notion of an agent’s surprise about its experiences via intrinsic motivation. The optimal strategy will be deemed to be a global-optimal policy by which the AV can travel more efficiently. We then evaluate the decision-making system in a naturalistic driving environment. We focus on lane change maneuvers, modeling the differences between AVs and Human-controlled Vehicles (HVs) using the Safety Pilot Model Deployment Program’s naturalistic driving data. The probability of crashes serves as the primary metric for evaluating the safety of AV systems. In general, testing a system in a naturalistic driving environment is time-consuming and not cost-effective. To overcome this problem, we propose an accelerated evaluation method called Subset Simulation (SS), which can significantly reduce evaluation time and beat the baseline Importance Sampling (IS) method. This technique is not only capable of evaluating a system with a high-dimension state space, but also has the potential to conduct evaluations of more complicated systems (e.g., object detection systems). The SS method is limited, however, in that the “danger regions” are searched only as the test procedure unfolds. If the environmental statistics change, the crash rate cannot be estimated accurately. 
Therefore, we prefer to evaluate the decision-making system without including the environmental statistics. To this end, we propose an evaluation method based on the two-player Markov game. We introduce an attacker into the environment which keeps “attacking” the AV in a socially acceptable fashion. The attacker tries to lure the AV into AV-responsible crashes (as opposed to “crazy” crashes). Once the attacker has completed training, the AV is evaluated by introducing the attacker. The crash rate of the system then becomes 50 times greater in the environment with the attacker, which allows the system to register fatal flaws in the original training environment design. Introducing attackers capable of generating socially acceptable attacks makes the behavior of the surrounding vehicles more diverse. Our goal is to improve the original policy so as to design a safe and robust decision-making system under situations with different types of drivers in the environment, different traffic densities, and differing numbers of total surrounding vehicles. We tackle this problem by implementing the state-of-the-art Meta-Reinforcement Learning (MRL) method to train an agent to quickly adapt to different environments with limited data. The MRL-trained policy can significantly decrease the crash rate with a small amount of data across different environments. This technique has tremendous potential for helping the AV quickly adapt to varying conditions such as different locations, weather, and lighting.
    PhD, Mechanical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/168058/1/songanz_1.pd

    Virtual Line Shafting-Based Total-Amount Coordinated Control of Multi-Motor Traction Power

    This paper investigates a virtual line shafting-based total-amount coordinated control method for multi-motor traction power, aimed at the traffic safety problem caused by train traction power loss. The method regulates the total amount rather than synchronising the individual motors in a multi-motor control system. Firstly, a block diagram of the proposed method is built. Secondly, on the basis of this diagram, an accurate system model with parameter perturbations is constructed. Thirdly, a virtual controller is designed to quickly adjust the output torque of the virtual motor and to realise tracking control of the reference torque. A total-amount coordinated control strategy based on the integral sliding mode is also designed to keep the total traction power of the multi-motor system constant under uncertain and unknown disturbances. Lyapunov stability theory is used to prove the system stability. The simulation and experiment results verify the effectiveness of the virtual controller and the total-amount coordinated control strategy in guaranteeing system robustness under disturbances and parameter perturbations.

    Fault-Tolerant Control of a Nonlinear System Actuator Fault Based on Sliding Mode Control

    This paper presents a fault-tolerant control scheme for a class of nonlinear systems with actuator faults and unknown input disturbances. First, the sliding mode control law is designed based on the reaching law method. Then, in view of unpredictable state variables and unknown information in the control law, the original system is transformed into two subsystems through a coordinate transformation. One subsystem has only actuator faults, and the other has both actuator faults and disturbances. A sliding mode observer is designed for each of the two subsystems, and the equivalence principle of the sliding mode variable structure is used to accurately reconstruct the actuator faults and disturbances. Finally, the observed and reconstructed values are used to adjust the designed sliding mode control law online, realizing fault-tolerant control of the system. Simulation results are presented to demonstrate the approach.

    Consistent Total Traction Torque-Oriented Coordinated Control of Multimotors with Input Saturation for Heavy-Haul Locomotives

    In the coordinated control of multiple motors for heavy-haul locomotives, the input value for a motor often exceeds its maximum allowable input value, resulting in the saturation problem. A traction total-amount coordinated tracking control (TACTC) strategy is proposed to address the input saturation of heavy-haul locomotives driven by multiple motors. This strategy reduces control input and suppresses input saturation. First, a multimotor traction model with uncertain parameter perturbations and external disturbances is established. Next, a sliding-mode disturbance observer (SMDO) is designed to reduce the sliding-mode switching gain, thereby decreasing the control input. An auxiliary anti-windup (AW) system is used to weaken the effect of input saturation on tracking performance. Then, the observed value and auxiliary state are fed back to the sliding-mode controller to design a TACTC protocol and ensure that the total amount of traction torque follows the desired traction characteristic curve. Finally, Matlab/Simulink simulation and RT-Lab semiphysical experiment results show that the proposed strategy can effectively suppress the input saturation problem of multimotor coordinated control.

    HPV Infection and Prognostic Factors of Tongue Squamous Cell Carcinoma in Different Ethnic Groups from Geographically Closed Cohort in Xinjiang, China

    Background. The effect of HPV infection status and ethnic differences on the prognosis of tongue squamous cell carcinoma in Xinjiang presents an interesting set of conditions that has yet to be studied. Methods. A comprehensive analysis of clinical data was undertaken for a cohort of 63 patients with tongue squamous cell carcinoma recruited from three ethnic groups in Xinjiang. PCR was used to detect HPV16 and HPV18 infections. Kaplan-Meier survival analysis was used to analyze survival outcome, in addition to the assessment of other prognostic factors. Results. The overall HPV infection rate was 28.6% (18/63); the 5-year survival rate was 47.8% among HPV-positive patients and 30.3% among HPV-negative patients. The survival rate for HPV-positive patients who received radiotherapy and chemotherapy was better than for those who did not. N staging and HPV infection were found to be two independent and significant prognostic factors. Conclusion. HPV-positive patients with tongue squamous cell carcinoma are more sensitive to chemotherapy. Higher N staging indicates poor prognosis.
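    For readers unfamiliar with the Kaplan-Meier analysis mentioned in the abstract above, the following is a minimal sketch of the product-limit estimator. The function name `kaplan_meier` and the six follow-up records are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of survival at each distinct event time.

    durations: follow-up time for each patient (e.g. months)
    events:    1 if the event (death) was observed, 0 if censored
    Returns a list of (time, survival_probability) pairs.
    """
    records = sorted(zip(durations, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t
        at_risk -= leaving
    return curve


# Toy follow-up data (illustrative only, NOT from the study)
months = [6, 12, 12, 20, 30, 60]
died = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"t={t:>2} months  S(t)={s:.3f}")
```

    With these toy records the estimate drops to 5/6 at 6 months, 2/3 at 12 months, and 4/9 at 20 months; censored patients reduce the number at risk without lowering the curve.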